AITopics | numerical feature


numerical feature


146b4bab3f8536a07905f25d367b4924-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-24-2026, 17:51:05 GMT

accuracy, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


d3bcbcb2a7b0b4716bf24ce4b2ea8d60-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, Feb-12-2026, 03:17:07 GMT

batch, dataset, roc-auc, (15 more...)

Neural Information Processing Systems

Country: Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.08)

Industry: Information Technology > Security & Privacy (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Security & Privacy (0.69)


On Embeddings for Numerical Features in Tabular Deep Learning

Neural Information Processing Systems, Feb-11-2026, 01:31:00 GMT

Unlike traditional models, e.g., MLP, these architectures map scalar values of numerical features to high-dimensional embeddings before mixing them in the main backbone.

artificial intelligence, machine learning, numerical feature, (19 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)


7428310c0f97f1c6bb2ef1be99c1ec2a-Paper-Conference.pdf

Neural Information Processing Systems, Feb-9-2026, 20:12:35 GMT

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Minnesota (0.04)
(2 more...)

Genre: Research Report > New Finding (0.47)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


146b4bab3f8536a07905f25d367b4924-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-7-2026, 14:24:29 GMT

accuracy, dataset, numerical feature, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


Cascaded Flow Matching for Heterogeneous Tabular Data with Mixed-Type Features

Mueller, Markus, Gruber, Kathrin, Fok, Dennis

arXiv.org Machine Learning, Feb-2-2026

Advances in generative modeling have recently been adapted to tabular data containing discrete and continuous features. However, generating mixed-type features that combine discrete states with an otherwise continuous distribution in a single feature remains challenging. We advance the state of the art in diffusion models for tabular data with a cascaded approach. We first generate a low-resolution version of a tabular data row, that is, the collection of the purely categorical features and a coarse categorical representation of numerical features. Next, this information is leveraged in the high-resolution flow matching model via a novel guided conditional probability path and data-dependent coupling. The low-resolution representation of numerical features explicitly accounts for discrete outcomes, such as missing or inflated values, and thereby enables a more faithful generation of mixed-type features. We formally prove that this cascade tightens the transport cost bound. The results indicate that our model generates significantly more realistic samples and captures distributional details more accurately; for example, the detection score increases by 40%.
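To make the idea of a coarse categorical representation of a numerical feature concrete, here is a minimal sketch. It is not the authors' procedure: the function name `coarse_category`, the bin edges, and the choice of 0.0 as an inflated value are all hypothetical, chosen only to show how missing and inflated values can get their own discrete states next to ordinary bins.

```python
import bisect
import math
from typing import Optional, Sequence


def coarse_category(value: Optional[float],
                    edges: Sequence[float],
                    inflated: Sequence[float] = (0.0,)) -> str:
    """Map one numerical value to a coarse categorical state.

    edges    -- sorted interior bin edges (hypothetical, e.g. quantiles)
    inflated -- values that carry excess probability mass (assumed known)
    """
    # Missing values get a dedicated state instead of being imputed.
    if value is None or math.isnan(value):
        return "missing"
    # Inflated values (such as an exact zero) also get their own state.
    if value in inflated:
        return f"inflated:{value}"
    # Everything else falls into an ordinary bin, indexed from 0.
    return f"bin:{bisect.bisect_right(edges, value)}"


if __name__ == "__main__":
    for v in (None, 0.0, 5.0, 50.0, 1000.0):
        print(v, "->", coarse_category(v, edges=[10.0, 100.0]))
```

In a cascade like the one the abstract describes, a state of this kind would be generated first and the exact numerical value afterwards, conditioned on it. That second stage is outside the scope of this sketch.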

artificial intelligence, deep learning, machine learning, (14 more...)

arXiv.org Machine Learning

2601.22816

Country:

Asia > China > Beijing > Beijing (0.05)
Europe > Netherlands > South Holland > Rotterdam (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)


On Embeddings for Numerical Features in Tabular Deep Learning

Neural Information Processing Systems, Dec-24-2025, 21:28:47 GMT

Recently, Transformer-like deep architectures have shown strong performance on tabular data problems. Unlike traditional models, e.g., MLP, these architectures map scalar values of numerical features to high-dimensional embeddings before mixing them in the main backbone. In this work, we argue that embeddings for numerical features are an underexplored degree of freedom in tabular DL, which allows constructing more powerful DL models and competing with gradient boosted decision trees (GBDT) on some GBDT-friendly benchmarks (that is, where GBDT outperforms conventional DL models). We start by describing two conceptually different approaches to building embedding modules: the first one is based on a piecewise linear encoding of scalar values, and the second one utilizes periodic activations. Then, we empirically demonstrate that these two approaches can lead to significant performance boosts compared to the embeddings based on conventional blocks such as linear layers and ReLU activations. Importantly, we also show that embedding numerical features is beneficial for many backbones, not only for Transformers. Specifically, after proper embeddings, simple MLP-like models can perform on par with the attention-based architectures. Overall, we highlight embeddings for numerical features as an important design aspect with good potential for further improvements in tabular DL.
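The two embedding approaches named in the abstract can be sketched for a single scalar as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: the bin edges and frequencies are hypothetical fixed inputs (in a trained model they would typically be derived from data or learned), values outside the bin range are simply clamped, and the function names are made up for this example.

```python
import math
from typing import List, Sequence


def piecewise_linear_encode(x: float, edges: Sequence[float]) -> List[float]:
    """Encode scalar x against sorted bin edges b_0 < b_1 < ... < b_T.

    Component t is 1.0 when x lies above bin t, 0.0 when x lies below it,
    and the fractional position of x inside the bin that contains it.
    Values outside [b_0, b_T] are clamped (a simplification of this sketch).
    """
    if len(edges) < 2 or any(lo >= hi for lo, hi in zip(edges, edges[1:])):
        raise ValueError("edges must be strictly increasing, with at least two values")
    encoding = []
    for lo, hi in zip(edges, edges[1:]):
        frac = (x - lo) / (hi - lo)
        encoding.append(min(1.0, max(0.0, frac)))
    return encoding


def periodic_encode(x: float, frequencies: Sequence[float]) -> List[float]:
    """Map scalar x to sines followed by cosines of 2*pi*c*x for each frequency c."""
    angles = [2.0 * math.pi * c * x for c in frequencies]
    return [math.sin(a) for a in angles] + [math.cos(a) for a in angles]


if __name__ == "__main__":
    print(piecewise_linear_encode(2.5, [0.0, 1.0, 2.0, 3.0, 4.0]))  # [1.0, 1.0, 0.5, 0.0]
    print(periodic_encode(0.25, [1.0, 2.0]))
```

Either encoding turns one scalar into a vector, which a linear layer or the backbone itself can then consume in place of the raw value.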

name change, numerical feature, tabular deep learning, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.39)


Private Synthetic Data for Multitask Learning and Marginal Queries

Neural Information Processing Systems, Dec-24-2025, 11:19:54 GMT

We provide a differentially private algorithm for producing synthetic data simultaneously useful for multiple tasks: marginal queries and multitask machine learning (ML). A key innovation in our algorithm is the ability to directly handle numerical features, in contrast to a number of related prior approaches which require numerical features to be first converted into high-cardinality categorical features via a binning strategy. Higher binning granularity is required for better accuracy, but this negatively impacts scalability. Eliminating the need for binning allows us to produce synthetic data preserving large numbers of statistical queries such as marginals on numerical features, and class-conditional linear threshold queries. Preserving the latter means that the fraction of points of each class label above a particular half-space is roughly the same in both the real and synthetic data. This is the property that is needed to train a linear classifier in a multitask setting. Our algorithm also allows us to produce high-quality synthetic data for mixed marginal queries that combine both categorical and numerical features. Our method consistently runs 2-5x faster than the best comparable techniques, and provides significant accuracy improvements in both marginal queries and linear prediction tasks for mixed-type datasets.
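A class-conditional linear threshold query, as described in the abstract, can be evaluated with a few lines of code. The sketch below is non-private and only shows what the query measures. The names `threshold_query` and `query_error` are hypothetical, and normalising by the total number of records (rather than by the class size) is an assumption of this example.

```python
from typing import Sequence, Tuple

Record = Tuple[Sequence[float], int]  # (numerical features, class label)


def threshold_query(data: Sequence[Record], weights: Sequence[float],
                    threshold: float, label: int) -> float:
    """Fraction of all records that carry `label` and satisfy <w, x> > threshold."""
    if not data:
        raise ValueError("data must be non-empty")
    hits = 0
    for features, y in data:
        score = sum(w * v for w, v in zip(weights, features))
        if y == label and score > threshold:
            hits += 1
    return hits / len(data)


def query_error(real: Sequence[Record], synthetic: Sequence[Record],
                weights: Sequence[float], threshold: float, label: int) -> float:
    """Absolute gap between the query answers on real and synthetic data."""
    return abs(threshold_query(real, weights, threshold, label)
               - threshold_query(synthetic, weights, threshold, label))


if __name__ == "__main__":
    real = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([2.0, 2.0], 1), ([0.5, 0.5], 1)]
    synthetic = [([1.8, 2.1], 1), ([0.2, 0.9], 0), ([0.9, 0.1], 1), ([0.4, 0.6], 0)]
    print(query_error(real, synthetic, weights=[1.0, 1.0], threshold=1.5, label=1))
```

A small error over many choices of weights, threshold, and label is the sense in which synthetic data "preserves" these queries. No binning of the numerical features is needed to evaluate them.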

multitask learning and marginal query, numerical feature, private synthetic data, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.97)


Mixed Data Clustering Survey and Challenges

Guerard, Guillaume, Djebali, Sonia

arXiv.org Artificial Intelligence, Dec-4-2025

This paradigm challenges traditional data management and analysis techniques by demanding innovative solutions capable of processing, analyzing, and deriving insights from vast and diverse datasets. In particular, the inclusion of mixed data types, such as numerical and categorical variables, poses significant challenges to conventional methodologies, necessitating the development of novel approaches to effectively leverage the wealth of information available [2]. Traditionally, data handling methods were designed around homogeneous datasets, typically consisting of numerical values. However, the big data paradigm introduces a multitude of data types, including structured, unstructured, and semi-structured data, which demand a departure from traditional approaches. Moreover, the three primary characteristics of big data--volume, velocity, and variety--amplify the complexity of data analysis, requiring scalable and adaptable solutions capable of processing large volumes of data at high speeds while accommodating diverse data formats and structures. These methods for handling mixed data often involve separate analyses of categorical and numerical variables, treating them as distinct entities rather than integrating their interdependencies.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/s42979-025-04439-7

2512.0307

Country: Asia (0.28)

Genre:

Research Report > Promising Solution (1.00)
Overview (1.00)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)


Real-PGDN: A Two-level Classification Method for Full-Process Recognition of Newly Registered Pornographic and Gambling Domain Names

Wang, Hao, Wang, Yingshuo, Gan, Junang, Cheng, Yanan, Zhang, Jinshuai

arXiv.org Artificial Intelligence, Dec-1-2025

Online pornography and gambling have consistently posed regulatory challenges for governments, threatening both personal assets and privacy. Therefore, it is imperative to research the classification of the newly registered Pornographic and Gambling Domain Names (PGDN). However, scholarly investigation into this topic is limited. Previous efforts in PGDN classification pursue high accuracy using ideal sample data, while others employ up-to-date data from real-world scenarios but achieve lower classification accuracy. This paper introduces the Real-PGDN method, which accomplishes a complete process of timely and comprehensive real-data crawling, feature extraction with feature-missing tolerance, precise PGDN classification, and assessment of application effects in actual scenarios. Our two-level classifier, which integrates CoSENT (BERT-based), Multilayer Perceptron (MLP), and traditional classification algorithms, achieves a 97.88% precision. The research process amasses the NRD2024 dataset, which contains continuous detection information over 20 days for 1,500,000 newly registered domain names across 6 directions. Results from our case study demonstrate that this method also maintains a forecast precision of over 70% for PGDN that are delayed in usage after registration.

artificial intelligence, domain name, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2511.22215

Country: Asia > China > Heilongjiang Province (0.14)

Genre: Research Report > New Finding (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)
